A Constant Approximation for Streaming k-means
Abstract
This article gives a constant-factor approximation algorithm for streaming k-means that uses O(k log n) space.
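For readers unfamiliar with the terminology, the following minimal sketch shows the k-means objective and what a constant-factor guarantee compares. It is not the article's streaming algorithm: it evaluates costs offline on data held in memory, and the function names (`kmeans_cost`, `approximation_ratio`) are illustrative assumptions rather than anything defined in the article.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means objective: each point pays the squared distance to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def approximation_ratio(
    points: Sequence[Point],
    centers: Sequence[Point],
    optimal_centers: Sequence[Point],
) -> float:
    """Cost of a candidate solution divided by the cost of a reference optimum.

    Assumes the reference optimum has nonzero cost.
    """
    return kmeans_cost(points, centers) / kmeans_cost(points, optimal_centers)


# Hypothetical toy data: four points on a line, k = 2.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
optimal_centers = [(1.0, 0.0), (11.0, 0.0)]    # cluster means
candidate_centers = [(0.0, 0.0), (10.0, 0.0)]  # centers restricted to input points

ratio = approximation_ratio(points, candidate_centers, optimal_centers)
```

A constant-factor approximation algorithm guarantees that this ratio is bounded by a fixed constant, independent of n and of the input. The streaming setting adds the requirement that the centers be computed in one pass with limited memory, which this sketch does not attempt.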
Similar resources
Streaming k-means on Well-Clusterable Data
One of the central problems in data analysis is k-means clustering. In recent years, considerable attention in the literature has addressed the streaming variant of this problem, culminating in a series of results (Har-Peled and Mazumdar; Frahling and Sohler; Frahling, Monemizadeh, and Sohler; Chen) that produced a (1 + ε)-approximation for k-means clustering in the streaming setting. Unfortunately,...
Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering
We prove that the sum of the squared Euclidean distances from the n rows of an n×d matrix A to any compact set that is spanned by k vectors in R^d can be approximated up to a (1+ε)-factor, for an arbitrarily small ε > 0, using the O(k/ε)-rank approximation of A and a constant. This implies, for example, that the optimal k-means clustering of the rows of A is (1+ε)-approximated by an optimal k-means cl...
Streaming k-means approximation
We provide a clustering algorithm that approximately optimizes the k-means objective in the one-pass streaming setting. We make no assumptions about the data, and our algorithm is very lightweight in terms of memory and computation. This setting is applicable to unsupervised learning on massive data sets or on resource-constrained devices. The two main ingredients of our theoretical work are: ...
Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity
Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the k-cent...
Better Streaming Algorithms for the Maximum Coverage Problem
We study the classic NP-hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe {1, ..., n}, find the k sets that cover the largest number of distinct elements. The problem can be approximated up to a factor of 1 − 1/e in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The ...